Closed high utility quantitative itemset mining algorithm on incremental data
Zhihui SHAN, Meng HAN, Qiang HAN
Journal of Computer Applications    2023, 43 (7): 2049-2056.   DOI: 10.11772/j.issn.1001-9081.2022091333

High Utility Itemset (HUI) mining can provide information about combinations of highly profitable items in a dataset, which is useful for developing effective marketing strategies in real-world applications. However, HUIs provide only the itemsets and their total utility, not the purchase quantities of individual items, and in real scenarios these quantities provide more precise information. Therefore, High Utility Quantitative Itemset (HUQI) mining algorithms have been proposed. To address the problems that current HUQI mining algorithms can process only static data and produce redundant result sets, an incrementally updated quantitative utility list structure was proposed for storing and updating the utility information of items in the dataset, and based on this structure, an algorithm for mining Closed High Utility Quantitative Itemsets (CHUQIs) was proposed. The time and memory consumption of the proposed algorithm were compared with those of the Faster High Utility Quantitative Itemset Miner (FHUQI-Miner) algorithm in terms of the number of result sets, minimum utility threshold, number of batches, and scalability. Experimental results show that the proposed algorithm can process incremental data effectively and mine more interesting itemsets.
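
To make the difference between an HUI and an HUQI concrete, the following is a minimal sketch of how the two utilities might be computed on a toy database. The data, the function names and the exact-quantity matching rule are illustrative assumptions, not the paper's utility list structure or its CHUQI algorithm.

```python
from typing import Dict, List, Set

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions: List[Dict[str, int]] = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 4, "b": 2, "c": 1},
]
# Hypothetical unit profit of each item.
unit_profit: Dict[str, int] = {"a": 5, "b": 3, "c": 1}


def itemset_utility(itemset: Set[str], db: List[Dict[str, int]],
                    profit: Dict[str, int]) -> int:
    """Total utility of an itemset: quantity * unit profit, summed over
    every transaction that contains all items of the itemset."""
    total = 0
    for tx in db:
        if itemset.issubset(tx):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def quantitative_utility(q_itemset: Dict[str, int], db: List[Dict[str, int]],
                         profit: Dict[str, int]) -> int:
    """Utility of a quantitative itemset such as {"a": 2, "b": 1}: only
    transactions with exactly these quantities contribute (an assumed,
    simplified matching rule)."""
    total = 0
    for tx in db:
        if all(tx.get(item) == qty for item, qty in q_itemset.items()):
            total += sum(qty * profit[item] for item, qty in q_itemset.items())
    return total


if __name__ == "__main__":
    print(itemset_utility({"a", "b"}, transactions, unit_profit))
    print(quantitative_utility({"a": 2, "b": 1}, transactions, unit_profit))
```

An itemset would count as high utility when its value reaches a user-chosen minimum utility threshold; the quantitative variant additionally reports which purchase quantities produced that utility.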

Survey of high utility itemset mining methods based on intelligent optimization algorithm
Zhihui GAO, Meng HAN, Shujuan LIU, Ang LI, Dongliang MU
Journal of Computer Applications    2023, 43 (6): 1676-1686.   DOI: 10.11772/j.issn.1001-9081.2022060865

High Utility Itemset Mining (HUIM) mines itemsets of high significance from a transaction database, thus helping users make better decisions. Because intelligent optimization algorithms can significantly improve the efficiency of mining high utility itemsets in massive data, a survey of intelligent optimization algorithm-based HUIM methods was presented. Firstly, these methods were analyzed and summarized in detail from three aspects: swarm intelligence optimization-based, evolution-based and other intelligent optimization algorithm-based methods. Meanwhile, the Particle Swarm Optimization (PSO)-based HUIM methods were categorized in detail by particle update method, including traditional update strategy-based, sigmoid function-based, greedy-based, roulette-based and ensemble-based methods. Additionally, the swarm intelligence optimization algorithm-based HUIM methods were compared and analyzed from the perspectives of population update methods, comparison algorithms, parameter settings, advantages and disadvantages. Next, the evolution-based HUIM methods were summarized in terms of both genetic and bionic aspects. Finally, future research directions were proposed to address the problems of the existing intelligent optimization algorithm-based HUIM methods.
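
As an illustration of the sigmoid function-based particle update mentioned above, here is a minimal sketch of the classic binary PSO step, in which each bit of a particle marks whether an item is in the candidate itemset. The parameter values (`w`, `c1`, `c2`, `v_max`) and the function names are assumptions chosen for illustration; the surveyed methods differ in their details.

```python
import math
import random
from typing import List, Tuple


def sigmoid(v: float) -> float:
    """Squash a velocity into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def update_particle(
    position: List[int],
    velocity: List[float],
    pbest: List[int],
    gbest: List[int],
    rng: random.Random,
    w: float = 0.7,      # inertia weight (assumed value)
    c1: float = 1.5,     # pull toward the particle's own best (assumed value)
    c2: float = 1.5,     # pull toward the swarm's best (assumed value)
    v_max: float = 4.0,  # velocity clamp (assumed value)
) -> Tuple[List[int], List[float]]:
    """One binary PSO step: update each velocity, then sample each bit
    with probability sigmoid(velocity)."""
    new_position: List[int] = []
    new_velocity: List[float] = []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        v_next = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v_next = max(-v_max, min(v_max, v_next))  # keep sigmoid away from 0 and 1
        bit = 1 if rng.random() < sigmoid(v_next) else 0
        new_position.append(bit)
        new_velocity.append(v_next)
    return new_position, new_velocity


if __name__ == "__main__":
    rng = random.Random(0)
    pos, vel = update_particle([1, 0, 1, 0], [0.0] * 4, [1, 1, 0, 0], [1, 1, 1, 0], rng)
    print(pos, vel)
```

In an HUIM setting the fitness of a particle would be the utility of the itemset it encodes, which is what `pbest` and `gbest` would track.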

Overview of classification methods for complex data streams with concept drift
Dongliang MU, Meng HAN, Ang LI, Shujuan LIU, Zhihui GAO
Journal of Computer Applications    2023, 43 (6): 1664-1675.   DOI: 10.11772/j.issn.1001-9081.2022060881

Traditional classifiers struggle to cope with complex types of data streams with concept drift, and the classification results they obtain are often unsatisfactory. Focusing on methods for handling concept drift in different types of data streams, classification methods for complex data streams with concept drift were summarized from four aspects: imbalance, concept evolution, multi-label and noise. Firstly, the classification methods for the four aspects were introduced and analyzed: block-based and online-based learning approaches for classifying imbalanced concept drift data streams, clustering-based and model-based learning approaches for classifying concept evolution concept drift data streams, problem transformation-based and algorithm adaptation-based learning approaches for classifying multi-label concept drift data streams and noisy concept drift data streams. Then, the experimental results and performance metrics of these classification methods were compared and analyzed in detail. Finally, the shortcomings of the existing methods and future research directions were given.

Multi-stage weighted concept drift detection method
Zhiqiang CHEN, Meng HAN, Hongxin WU, Muhang LI, Xilong ZHANG
Journal of Computer Applications    2023, 43 (3): 776-784.   DOI: 10.11772/j.issn.1001-9081.2022020231

To address the difficulty that existing drift detection methods have in balancing detection delay, false positives, false negatives, and spatiotemporal efficiency, a new stage transition threshold parameter was proposed, and a multi-stage weighting mechanism of “stable stage-warning stage-drift stage” was introduced into concept drift detection to weight instances by stage; the mechanism was applied to a double sliding window. Then a Multi-Stage weighted Drift Detection Method (MSDDM) based on the Hoeffding inequality was proposed. On artificial datasets, MSDDM detected abrupt and gradual concept drift faster than Fast Hoeffding Drift Detection Method (FHDDM), Drift Detection Method based on Hoeffding’s bound (HDDM) and other drift detection methods, while maintaining a low false detection rate and a low false alarm rate. At the same time, MSDDM had the highest classification accuracy in most cases compared with other methods on real-world datasets. Experimental results show that MSDDM can detect concept drift in data streams with high drift detection performance and great spatiotemporal efficiency.
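
For readers unfamiliar with Hoeffding-based detection, the following is a minimal sketch of a single sliding-window detector in the style of FHDDM, the baseline named above. It is not MSDDM: it has no stage transition threshold, no instance weighting and only one window. The class name and default parameters are assumptions for illustration.

```python
import math
from collections import deque


class HoeffdingWindowDetector:
    """Signals drift when windowed accuracy falls below its historical
    maximum by more than the Hoeffding bound epsilon."""

    def __init__(self, window_size: int = 25, delta: float = 1e-6) -> None:
        self.window = deque(maxlen=window_size)
        # Hoeffding bound for the mean of n values in [0, 1] at confidence 1 - delta.
        self.epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * window_size))
        self.p_max = 0.0

    def update(self, correct: bool) -> bool:
        """Feed one prediction outcome; return True if drift is detected."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        p = sum(self.window) / len(self.window)
        self.p_max = max(self.p_max, p)
        if self.p_max - p >= self.epsilon:
            # Reset so the detector can adapt to the new concept.
            self.window.clear()
            self.p_max = 0.0
            return True
        return False


if __name__ == "__main__":
    detector = HoeffdingWindowDetector()
    stream = [True] * 50 + [False] * 25  # accuracy collapses after 50 instances
    for i, outcome in enumerate(stream):
        if detector.update(outcome):
            print("drift detected at instance", i)
            break
```

A smaller `delta` or a smaller window widens `epsilon`, trading detection delay against false alarms, which is the balance the abstract refers to.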

Ensemble classification algorithm based on dynamic weighting function
Le WANG, Meng HAN, Xiaojuan LI, Ni ZHANG, Haodong CHENG
Journal of Computer Applications    2022, 42 (4): 1137-1147.   DOI: 10.11772/j.issn.1001-9081.2021071259

In data stream ensemble classification, to make the classifiers adapt to the constantly changing data stream and to adjust the weights of base classifiers so that an appropriate set of classifiers is selected, an ensemble classification algorithm based on a dynamic weighting function was proposed. Firstly, a new weighting function was proposed to adjust the weights of the base classifiers, and the classifiers were trained with constantly updated data blocks. Then the weighting function was used to make a reasonable selection of candidate classifiers. Finally, the incremental nature of decision trees was applied to the base classifiers, and the classification of the data stream was realized. A large number of experiments show that the performance of the proposed algorithm is not affected by block size. Compared with the AUE2 algorithm, the average number of leaves is reduced by 681.3, the average number of nodes is reduced by 1,192.8, and the average depth of the tree is reduced by 4.42. At the same time, the accuracy is improved and the time consumption is reduced. Experimental results show that the algorithm can not only guarantee accuracy but also save a lot of memory and time when classifying data streams.
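
The general idea of weighting base classifiers and combining their votes can be sketched as follows. The abstract does not give the paper's weighting function, so accuracy on the most recent data block is used here purely as a stand-in assumption; the function names are hypothetical as well.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def block_accuracy(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Stand-in weight: fraction of correct predictions on the latest block."""
    if not actual:
        return 0.0
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def weighted_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label with the largest total weight among the base classifiers."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Three hypothetical base classifiers voting on one instance.
    print(weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.3]))
```

Recomputing the weights after every block lets a strong classifier outvote several weak ones, and lets classifiers trained on an outdated concept fade out.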

Survey of high utility pattern mining methods based on positive and negative utility division
Ni ZHANG, Meng HAN, Le WANG, Xiaojuan LI, Haodong CHENG
Journal of Computer Applications    2022, 42 (4): 999-1010.   DOI: 10.11772/j.issn.1001-9081.2021071268

High Utility Pattern Mining (HUPM) is one of the emerging research topics in data science. It considers the unit profit and quantity of items in a transaction database to extract more useful information. Traditional HUPM methods assume that the utility value of each item is positive, but in practical applications the utility values of some data items may be negative (for example, the profit of a product is negative because it is sold at a loss), and pattern mining with negative items is as important as pattern mining with only positive items. Firstly, the relevant concepts of HUPM were explained, and examples of the corresponding positive and negative utilities were given. Then, the HUPM methods were divided into positive and negative perspectives, among which the pattern mining methods with positive utility were further divided into dynamic and static database perspectives; the pattern mining methods with negative utility included Apriori-based, tree-based, utility list-based, and array-based key technologies. The HUPM methods were then discussed and summarized from different aspects. Finally, the shortcomings of the existing HUPM methods and future research directions were given.

Data center server energy consumption optimization algorithm combining XGBoost and Multi-GRU
Mingyao SHEN, Meng HAN, Shiyu DU, Rui SUN, Chunyan ZHANG
Journal of Computer Applications    2022, 42 (1): 198-208.   DOI: 10.11772/j.issn.1001-9081.2021071291

With the rapid development of cloud computing technology, the number of data centers has increased significantly, and the resulting energy consumption problem has gradually become a research hotspot. To address the problem of server energy consumption optimization, a data center server Energy Consumption Optimization algorithm combining eXtreme Gradient Boosting (XGBoost) and Multi-Gated Recurrent Unit (Multi-GRU), named ECOXG, was proposed. Firstly, data such as the resource occupation information and energy consumption of each server component were collected with Linux terminal monitoring commands and power consumption meters, and the data were preprocessed to obtain the resource utilization rates. Secondly, the resource utilization rates were concatenated into a time series in vector form, which was used to train the Multi-GRU load prediction model, and simulated frequency reduction was performed on the servers according to the prediction results to obtain the load data after frequency reduction. Thirdly, the resource utilization rates of the servers were combined with the energy consumption data at the same time points to train the XGBoost energy consumption prediction model. Finally, the load data after frequency reduction were input into the trained XGBoost model, and the energy consumption of the servers after frequency reduction was predicted. Experiments on the actual resource utilization data of 6 physical servers showed that the ECOXG algorithm reduced the Root Mean Square Error (RMSE) by 50.9%, 31.0%, 32.7% and 22.9% compared with Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, CNN-GRU and CNN-LSTM models, respectively. Meanwhile, compared with the LSTM, CNN-GRU and CNN-LSTM models, the ECOXG algorithm saved 43.2%, 47.1% and 59.9% of the training time, respectively.
Experimental results show that the ECOXG algorithm can provide a theoretical basis for the prediction and optimization of server energy consumption, and that it is significantly better than the comparison algorithms in accuracy and operating efficiency. In addition, the power consumption of the server after the simulated frequency reduction is significantly lower than the real power consumption, and the energy-saving effect is most pronounced when the utilization rates of the servers are low.
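
The reported improvements are relative reductions in RMSE against each baseline model. A minimal sketch of that arithmetic is shown below; the sample values are hypothetical and are not taken from the paper's experiments.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Hypothetical power readings in watts, for illustration only.
    measured = [100.0, 110.0, 120.0]
    baseline_pred = [104.0, 106.0, 124.0]
    proposed_pred = [102.0, 108.0, 122.0]
    base = rmse(measured, baseline_pred)
    new = rmse(measured, proposed_pred)
    print(f"RMSE reduced by {relative_reduction(base, new):.1%}")
```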

Dynamic weighted ensemble classification algorithm based on accuracy climbing
Xiaojuan LI, Meng HAN, Le WANG, Ni ZHANG, Haodong CHENG
Journal of Computer Applications    2022, 42 (1): 123-131.   DOI: 10.11772/j.issn.1001-9081.2021071234

In traditional ensemble classification algorithms, the ensemble size is generally set to a fixed value, which may lead to low classification accuracy. To address this problem, an accuracy Climbing Ensemble Classification Algorithm (C-ECA) was proposed. Firstly, instead of replacing a fixed number of the worst-performing base classifiers, the algorithm updated the base classifiers according to their accuracy, and then determined the optimal ensemble size. Secondly, on the basis of C-ECA, a Dynamic Weighted Ensemble Classification Algorithm based on Climbing (C-DWECA) was proposed. When a base classifier was trained on data streams with different features, its best weight could be obtained by a weighting function proposed in this algorithm, thereby improving the performance of the ensemble classifier. Finally, in order to detect concept drift earlier and improve the final accuracy, the Fast Hoeffding Drift Detection Method (FHDDM) was adopted. Experimental results show that the accuracy of C-DWECA can reach up to 97.44%, and the average accuracy of the proposed algorithm is about 40% higher than that of the Adaptable Diversity-based Online Boosting (ADOB) algorithm, and is also better than those of other comparison algorithms such as Leveraging Bagging (LevBag) and Adaptive Random Forest (ARF).

Survey of high utility pattern mining on dynamic data
Zhihui SHAN, Meng HAN, Qiang HAN
Journal of Computer Applications    2022, 42 (1): 94-108.   DOI: 10.11772/j.issn.1001-9081.2021071290

High Utility Pattern Mining (HUPM) provides details about items to help users make better economic decisions by considering the purchase quantities and unit profits of items. Since most HUPM algorithms are designed for static databases, which is inconsistent with real-world scenarios where data is constantly generated, HUPM algorithms on dynamic data have been proposed in recent years. Firstly, the HUPM algorithms on incremental data, data streams, dynamic deletion data and dynamic modification data, as well as the mining algorithms for integrated high utility patterns (such as high utility sequential patterns, average high utility patterns, and top-k high utility patterns), were summarized. Secondly, the algorithms that handle different types of data, including dynamic profit data, dynamic sequence data and other data types, were summed up. Thirdly, the HUPM algorithms were classified and summarized from the perspectives of data structure, pruning strategy, window model, advantages and disadvantages. Finally, to address the gaps in current research, future research directions for HUPM algorithms on dynamic data were proposed.
